"Trust me on this" Explaining Agent Behavior to a Human Terminator

Abstract

Consider a setting where a pre-trained agent is operating in an environment and a human operator can decide to temporarily terminate its operation and take over for some duration of time. These kinds of scenarios are common in human-machine interactions, for example in autonomous driving, factory automation, and healthcare. In these settings, we typically observe a trade-off between two extreme cases: if no take-overs are allowed, then the agent might employ a sub-optimal, possibly dangerous policy. Alternatively, if there are too many take-overs, then the human has no confidence in the agent, greatly limiting its usefulness. In this paper, we formalize this setup and propose an explainability scheme to help optimize the number of human interventions.
